APPROXIMATING THE MOMENTS OF MARGINALS OF 
HIGH DIMENSIONAL DISTRIBUTIONS 



ROMAN VERSHYNIN 

Abstract. For probability distributions on $\mathbb{R}^n$, we study the optimal
sample size $N = N(n,p)$ that suffices to uniformly approximate the $p$-th
moments of all one-dimensional marginals. Under the assumption that
the marginals have bounded $4p$ moments, we obtain the optimal bound
$N = O(n^{p/2})$ for $p > 2$. This bound goes in the direction of bridging
two recent results: a theorem of Guedon and Rudelson [7], which has an
extra logarithmic factor in the sample size, and a result of Adamczak, Litvak,
Pajor and Tomczak-Jaegermann [1], which requires stronger subexponential
moment assumptions.



1. Introduction 

1.1. The estimation problem. We study the following problem: how well
can one approximate one-dimensional marginals of a distribution on $\mathbb{R}^n$ by
sampling? Consider a random vector $X$ in $\mathbb{R}^n$, and suppose we would like
to compute the $p$-th moments of the marginals $\langle X, x\rangle$ for all $x \in \mathbb{R}^n$. To this
end, we sample $N$ independent copies $X_1, \ldots, X_N$ of $X$, compute the empirical
moment from that sample, and hope that it gives a good approximation of
the actual moment:

(1.1) $\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^N |\langle X_i, x\rangle|^p - \mathbb{E}|\langle X, x\rangle|^p \Big| \le \varepsilon.$

Indeed, by the law of large numbers this quantity converges to zero as $N \to \infty$.
To understand the quantitative nature of this convergence, one would like to
estimate the optimal sample complexity $N = N(n,p,\varepsilon)$ for which (1.1) holds
with high probability. For $p = 2$ this problem is equivalent to approximating
the covariance matrix of $X$ by a sample covariance matrix, and it was studied
in IH QH OH El Q] . For $p \ne 2$, the problem was also studied in [5j U [121 E] .
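To make the quantity in (1.1) concrete, the following minimal sketch estimates it numerically. Everything in it is an illustrative assumption rather than part of the paper: the standard Gaussian distribution is used only because $\mathbb{E}|\langle X, x\rangle|^p$ is then known in closed form, the supremum over the sphere is replaced by a maximum over finitely many random directions (so the result is only a lower estimate of the true supremum), and the function names are hypothetical.

```python
import math
import random


def gaussian_abs_moment(p):
    """Exact value of E|g|^p for a standard normal g."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def random_unit_vector(n, rng):
    """Uniform random direction on the sphere S^{n-1}."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def empirical_moment_deviation(n, N, p, num_directions, seed=0):
    """Largest value of |(1/N) sum_i |<X_i,x>|^p - E|<X,x>|^p| over
    num_directions random test directions x (a proxy for the supremum)."""
    rng = random.Random(seed)
    sample = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(N)]
    true_moment = gaussian_abs_moment(p)  # identical for every unit vector x
    worst = 0.0
    for _ in range(num_directions):
        x = random_unit_vector(n, rng)
        emp = sum(abs(sum(a * b for a, b in zip(X, x))) ** p for X in sample) / N
        worst = max(worst, abs(emp - true_moment))
    return worst
```

Increasing `N` for fixed `n` and `p` should drive the returned value toward zero, which is the qualitative statement of the law of large numbers; the sample complexity question asks how large `N` must be for a prescribed accuracy.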
A well known lower bound for the sample complexity is $N > n$ for $1 < p < 2$
and $N > n^{p/2}$ for $p > 2$. O. Guedon and M. Rudelson [7] proved the upper
bound $N = O(n^{p/2} \log n)$ for $p > 2$ under quite weak moment assumptions$^{1}$:

(1.2) $\|X\|_2 = O(\sqrt{n})$ a.s., $(\mathbb{E}|\langle X, x\rangle|^p)^{1/p} = O(1)$ for all $x \in S^{n-1}$.

The logarithmic term cannot in general be removed from the sample complexity;
this can be seen by considering a random vector $X$ uniformly distributed
in a set of $n$ orthogonal vectors of Euclidean norm $\sqrt{n}$. On the other hand,
R. Adamczak et al. [1] recently managed to remove the logarithmic term for
random vectors $X$ uniformly distributed in an isotropic convex body $K$ in $\mathbb{R}^n$,
showing that for such distributions one has $N = O(n)$ for $1 < p < 2$ and
$N = O(n^{p/2})$ for $p > 2$. Their result actually holds for all random vectors $X$
that satisfy the sub-exponential moment assumptions

(1.3) $\|X\|_2 = O(\sqrt{n})$ a.s., $(\mathbb{E}|\langle X, x\rangle|^q)^{1/q} = O(q)$ for all $q > 1$ and $x \in S^{n-1}$.

A program aiming at understanding general empirical processes with sub-exponential
tails is put forward by S. Mendelson [12, 13].

1.2. Distributions with finite moments: main result. At this moment
there is no complete understanding of which distributions on $\mathbb{R}^n$ require logarithmic
oversampling and which do not. Clearly there is a gap between the
minimal moment assumptions (1.2) of [7] and the subexponential assumptions
(1.3) of [1]. The present note makes a step toward closing this gap.

Distributions with tails heavier than exponential frequently arise in sta- 
tistics, economics, engineering and other exact sciences like geophysics and 
environment. Heavy-tailed distributions are frequently used to model data 
that exhibit large fluctuations - see e.g. [TU [91 12] and the references therein. 
A very basic theoretical example of a heavy-tailed random vector in $\mathbb{R}^n$ is
$X = (\xi_1, \ldots, \xi_n)$ where $\xi_i$ are independent random variables with mean zero,
unit variance and power-law tails $\mathbb{P}\{|\xi_i| > t\} \sim t^{-q}$ for some fixed exponent
$q > 2$ (e.g. the normalized Pareto distribution, to mention a specific example). Such
random vectors clearly satisfy $\mathbb{E}\|X\|_2^2 = n$, thus $\|X\|_2 = O(\sqrt{n})$ with high
probability. Moreover, the marginals have moments $(\mathbb{E}|\langle X, x\rangle|^{q'})^{1/q'} = O(1)$
for all $q' < q$, but the higher moments (for $q' > q$) are infinite.
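A vector of this kind can be sampled as in the sketch below. The specific construction (a symmetrized, rescaled Pareto variable obtained by inverse-transform sampling) and the function names are our own illustrative assumptions; the text only asks for mean zero, unit variance and a power-law tail with exponent $q > 2$.

```python
import math
import random


def sample_power_law_coordinate(q, rng):
    """Symmetric variable with P{|xi| > t} = (t / c)^(-q) for t >= c,
    scaled so that E xi = 0 and E xi^2 = 1 (requires q > 2)."""
    # Inverse transform: 1 - rng.random() lies in (0, 1], so raw >= 1
    # and P{raw > t} = t^(-q) for t >= 1.
    raw = (1.0 - rng.random()) ** (-1.0 / q)
    scale = math.sqrt((q - 2.0) / q)  # because E raw^2 = q / (q - 2)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * scale * raw


def sample_heavy_tailed_vector(n, q, rng):
    """Vector X = (xi_1, ..., xi_n) with independent power-law coordinates."""
    return [sample_power_law_coordinate(q, rng) for _ in range(n)]
```

For example, with `q = 5.0` the average of the squared coordinates of a long vector should be close to 1, in line with $\mathbb{E}\|X\|_2^2 = n$.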

We shall show that a version of the result of Adamczak et al. [1] holds under
finite moment assumptions for $p \ne 2$; specifically, the logarithmic oversampling
is not needed if we replace $p$ by $4p$ in the minimal moment assumptions (1.2).
We shall first consider independent random vectors $X_i$ in $\mathbb{R}^n$ that satisfy

(1.4) $\|X_i\|_2 \le K\sqrt{n}$ a.s., $(\mathbb{E}|\langle X_i, x\rangle|^q)^{1/q} \le L$ for all $x \in S^{n-1}$.



1 The constant implicit in the $O(\cdot)$ notation in the sample complexity $N$ depends only on
the constants implicit in the assumptions (1.2); the same convention applies to other results.
Theorem 1.1 (Approximation of marginals). Let $p > 2$, $\varepsilon > 0$ and $\delta > 0$.
Consider independent random vectors $X_i$ in $\mathbb{R}^n$ which satisfy (1.4) for $q = 4p$.
Let $N \ge C n^{p/2}$ where $C$ is a suitably large quantity that depends (polynomially)
only on $K, L, p, \varepsilon, \delta$. Then with probability at least $1 - \delta$ one has

(1.5) $\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^N \big( |\langle X_i, x\rangle|^p - \mathbb{E}|\langle X_i, x\rangle|^p \big) \Big| \le \varepsilon.$



Remarks. 1. A more elaborate version of this result is Proposition 4.3 below.
One can get more information on the probability in question using general
concentration of measure results, as is done in [7]. One can also modify the
argument to deduce a version of this result "with high probability" in the spirit of
[1], i.e. with probability converging to 1 (at a polynomial rate) as $n \to \infty$.

2. A standard modification of the argument (as in [1]) gives an optimal
result also in the range $1 < p < 2$. Namely, if the random vectors satisfy (1.4)
for some $q > 4p$, $q > 4$, then the conclusion (1.5) holds for $N \ge C_{K,L,p,q,\varepsilon,\delta}\, n$.

3. The method of the present note does not seem to work for $p = 2$; this
important and more difficult case is addressed in [18] with an oversampling by
a possibly parasitic $(\log\log n)^{C_{p,q}}$ factor.

The argument of this paper also yields sharp bounds on the norms of random
operators $\ell_2 \to \ell_p$. The following result is a version of a result of [1,
Corollary 4.12], proved there under the stronger sub-exponential moment
assumptions (1.3).

Theorem 1.2 (Norms of random matrices). Let $p > 2$ and $\delta > 0$. Consider
independent random vectors $X_i$ in $\mathbb{R}^n$ which satisfy (1.4) for $q = 4p$. Then
the $N \times n$ random matrix $A$ with rows $X_1, \ldots, X_N$ satisfies with probability at
least $1 - \delta$ that

$\|A\|_{\ell_2 \to \ell_p} \le C (n^{1/2} + N^{1/p})$

where $C$ depends (polynomially) only on $K, L, p, \delta$.

1.3. On the boundedness assumptions. Let us take a closer look at our
assumptions (1.4) on the distribution. The boundedness assumption $\|X_i\|_2 = O(\sqrt{n})$
a.s. seems to be too strong - even the standard Gaussian distribution in
$\mathbb{R}^n$ does not satisfy it. We will observe that, although this assumption cannot
be formally dropped, it can be removed by slightly modifying the estimation
process - discarding the sample vectors $X_i$ that do not satisfy it.

First, it is easy to see that the boundedness assumption $\|X_i\|_2 = O(\sqrt{n})$
a.s. cannot be dropped from our results. To this end, one easily constructs
a random vector whose Euclidean norm has a sufficiently heavy tail$^{2}$ so that
$\max_{i \le N} \|X_i\|_2 \gg \sqrt{n}$ with high probability for $N \gg 1$ and, in particular, for
the stated number of samples $N \sim n^{p/2}$. Then the approximation inequality
(1.5) will fail. Indeed, once we choose $x$ in the direction of the vector $X_i$ with
the largest Euclidean norm, we will have with high probability that
$|\langle X_i, x\rangle|^p = \|X_i\|_2^p \gg n^{p/2}$, which will force the average of the $N$ terms in (1.5) to be much
larger than $n^{p/2}/N \sim \mathrm{const}$, while $\mathbb{E}|\langle X_i, x\rangle|^p = O(1)$.

2 For example, one can achieve this by considering a version of a "multidimensional Pareto"
distribution [11] - the product of the standard Gaussian random vector in $\mathbb{R}^n$ by an independent
scalar random variable $\xi$ with a power-law tail.

As a side note, the last observation also shows that the sample size $N \sim n^{p/2}$
in Theorem 1.1 is optimal.

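Let us also note that the weaker boundedness assumption

(1.6) $(\mathbb{E}\|X_i\|_2^q)^{1/q} \le L\sqrt{n}$

follows automatically from the second (moment) assumption in (1.4). To see
this, we represent $\|X_i\|_2^2 = \sum_{j=1}^n Z_j$ where $Z_j = |\langle X_i, e_j\rangle|^2$ and where $(e_j)$ is
an orthonormal basis in $\mathbb{R}^n$. Then Minkowski's inequality yields (1.6):

$(\mathbb{E}\|X_i\|_2^q)^{2/q} = \Big( \mathbb{E}\Big| \sum_{j=1}^n Z_j \Big|^{q/2} \Big)^{2/q} \le \sum_{j=1}^n (\mathbb{E} Z_j^{q/2})^{2/q} = \sum_{j=1}^n (\mathbb{E}|\langle X_i, e_j\rangle|^q)^{2/q} \le L^2 n.$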

Although, as we noticed before, the strong boundedness assumption cannot
be dropped formally, it can easily be transferred into the estimation process.
Instead of using all sample points $X_i$ in the approximation inequality (1.5), one
can use only those with moderate norms, $\|X_i\|_2 = O(\sqrt{n})$. This will produce
a similar approximation result without any boundedness assumption. Just the
previous moment assumption will suffice:

(1.7) $(\mathbb{E}|\langle X_i, x\rangle|^q)^{1/q} \le L$ for all $x \in S^{n-1}$.

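Corollary 1.3 (Approximation of marginals: no boundedness assumption).
Let $p > 2$, $\varepsilon > 0$, $\delta > 0$ and $K > 0$. Consider independent random vectors
$X_i$ in $\mathbb{R}^n$ which satisfy (1.7) for $q = 4p$. Let $N \ge C n^{p/2}$ where $C$ is a suitably
large quantity that depends (polynomially) only on $K, L, p, \varepsilon, \delta$. Denote

$I := \{ i \le N : \|X_i\|_2 \le K\sqrt{n} \}.$

Then with probability at least $1 - \delta$ one has

$\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i \in I} |\langle X_i, x\rangle|^p - \frac{1}{N} \sum_{i=1}^N \mathbb{E}|\langle X_i, x\rangle|^p \Big| \le \varepsilon + K^{p-q} L^q.$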



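Proof. Consider the events $\mathcal{E}_i = \{\|X_i\|_2 \le K\sqrt{n}\}$. The conclusion then follows
by applying Theorem 1.1 to the random vectors $\bar{X}_i = X_i \mathbf{1}_{\mathcal{E}_i}$, which clearly
satisfy (1.4). Noting that $|\langle \bar{X}_i, x\rangle|^p = |\langle X_i, x\rangle|^p \mathbf{1}_{\mathcal{E}_i}$, we obtain this way that

(1.8) $\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^N \big( |\langle \bar{X}_i, x\rangle|^p - \mathbb{E}|\langle \bar{X}_i, x\rangle|^p \big) \Big| \le \varepsilon.$

To complete the proof, it remains to estimate the error

(1.9) $\big| \mathbb{E}|\langle X_i, x\rangle|^p - \mathbb{E}|\langle \bar{X}_i, x\rangle|^p \big| = \mathbb{E}|\langle X_i, x\rangle|^p \mathbf{1}_{\mathcal{E}_i^c} \le (\mathbb{E}|\langle X_i, x\rangle|^q)^{p/q} (\mathbb{P}(\mathcal{E}_i^c))^{1 - p/q}$

where we used Holder's inequality. To estimate the probability of $\mathcal{E}_i^c$ we use
(1.6), which follows from our moment assumption (1.7) as we noticed before.
By Chebyshev's inequality we obtain

$\mathbb{P}(\mathcal{E}_i^c) = \mathbb{P}\{\|X_i\|_2 > K\sqrt{n}\} \le (L/K)^q.$

Using this and the moment assumption (1.7), we conclude that the error (1.9) is
bounded by $L^p (L/K)^{q(1 - p/q)} = L^q K^{p-q}$. Therefore in (1.8) we can replace
$\mathbb{E}|\langle \bar{X}_i, x\rangle|^p$ by $\mathbb{E}|\langle X_i, x\rangle|^p$ by increasing the error bound $\varepsilon$ by $L^q K^{p-q}$. This
completes the proof. □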
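The modified estimator of Corollary 1.3 is easy to state in code. The sketch below is illustrative only: the function names are hypothetical, vectors are plain Python lists, and the threshold `K` is left to the caller, as in the corollary. Note that the discarded points still count toward the normalization by $N$.

```python
import math


def truncated_empirical_moment(sample, x, p, K):
    """(1/N) * sum over i in I of |<X_i, x>|^p, where I keeps the sample
    points with ||X_i||_2 <= K * sqrt(n); N counts all points, kept or not."""
    n = len(x)
    threshold = K * math.sqrt(n)
    total = 0.0
    for X in sample:
        if math.sqrt(sum(c * c for c in X)) <= threshold:
            total += abs(sum(a * b for a, b in zip(X, x))) ** p
    return total / len(sample)


def truncation_error_bound(L, K, p, q):
    """Extra error term L^q * K^(p - q) appearing in Corollary 1.3."""
    return L ** q * K ** (p - q)
```

Since $q > p$, the extra term decays polynomially in `K`, so a moderate threshold already makes it small.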

Remarks. 1. Of course one can achieve the approximation error $2\varepsilon$ in Corollary 1.3
by choosing the threshold $K = K(L, \varepsilon)$ sufficiently large.

2. For some distributions one may be able to show that with high probability,

(1.10) $\max_{i \le N} \|X_i\|_2 \le K\sqrt{n}$

for some moderate value of $K$ (ideally $K = O(1)$) and for the desired sample
size $N$. In this case, with high probability all events $\mathcal{E}_i$ from the proof of Corollary 1.3 hold
simultaneously, and therefore they can be dropped from the approximation
inequality. One thus obtains the same bound as in Theorem 1.1 except for the
extra error term $K^{p-q} L^q$.

This situation occurs, for example, in the estimation result of Adamczak et al.
[1] mentioned above. For the uniform distribution on an isotropic convex body,
the concentration theorem of G. Paouris [15] implies that $\mathbb{P}(\|X_i\|_2 > K\sqrt{n}) \le \exp(-\sqrt{n})$.
By the union bound this implies that (1.10) holds with probability
$1 - N \cdot \exp(-\sqrt{n})$, which is almost 1 for sample sizes $N$ growing linearly or
polynomially in $n$. This is why in the final result of [1] for uniform distributions
on convex bodies no boundedness assumption is needed, whereas for general
subexponential distributions one needs the boundedness assumption $\|X_i\|_2 = O(\sqrt{n})$ a.s.

1.4. Heuristics of the proof of Theorem 1.1. J. Bourgain [I] first demonstrated
that proving deviation estimates like (1.5) reduces to bounding the
contribution to the sum of the large coefficients - those for which $|\langle X_i, x\rangle| > B$
for a suitably large fixed level $B$. Such a reduction is used in some of the later
approaches to the problem [6, 1] as well as in the present note. However, after
this reduction we use a different route. Suppose for some vector $x \in S^{n-1}$ there
are $s = s(B)$ large coefficients as above. The new ingredient of this note is a
decoupling argument, which is formalized in Proposition 2.1. It transports the
vector $x$ into the linear span of at most $0.01s$ of these $X_i$, while approximately
retaining the largeness of the coefficients, $|\langle X_i, x\rangle| > B/4$. Let us condition
on these $0.01s$ random vectors $X_i$. On the one hand, we have reduced the
"complexity" of the problem - our $x$ now lies in a fixed $0.01s$-dimensional
subspace, which has an --net in the Euclidean metric of cardinality e 02s . On
the other hand, the inequality $|\langle X_i, x\rangle| > B$ holds for the remaining $0.99s$
vectors $X_i$, of which $x$ is independent; by (1.4) and Chebyshev's inequality
this happens with probability $(L/B)^{qs}$. Choosing the level $B$ suitably large so
that (L/B) qs < e-° ms allows us to take the union bound over the net, and
therefore to control the contribution of the large coefficients.

1.5. Organization of the paper. In Section 2 we develop the decoupling
argument. We use it to control the contribution of the large coefficients in
Section 3. This is formalized in Theorem 3.1, where we estimate the norm of
a random matrix $A$ with rows $X_i$ in the operator norm $\ell_2 \to \ell_{2,\infty}$, and also
in Lemma 4.2. In Section 4 we deduce in a standard way the main results
of this note - Theorem 1.1 on approximating the moments of marginals and
Theorem 1.2 on the norms of random matrices $\ell_2 \to \ell_p$.

In what follows, $C$ and $c$ will stand for positive absolute constants (suitably
chosen); quantities that depend only on the parameters in question such as
$K, L, p, q$ will be denoted $C_{K,L,p,q}$.

Acknowledgement. The author is grateful to the referees for their thorough 
reading of the first two versions of this manuscript and for many suggestions, 
which greatly improved the presentation of this paper. 

2. Decoupling 

Proposition 2.1 (Decoupling). Let $X_1, \ldots, X_s$ be vectors in $\mathbb{R}^n$ which satisfy
the following conditions for some $K_1$, $K_2$:

(2.1) $\|X_k\|_2 \le K_1 \sqrt{n}$, $\frac{1}{s} \sum_{i \le s,\, i \ne k} \langle X_i, X_k\rangle^2 \le K_2^4 n$, $k = 1, \ldots, s$.

Let $\delta \in (0,1)$ and let $B \ge C\delta^{-3/2} K_1$, $M \ge C\delta^{-1} K_2^2 / K_1$. Assume that there
exists $x \in S^{n-1}$ such that

$\langle X_i, x\rangle \ge B\sqrt{n/s} + M$, $i = 1, \ldots, s$.

Then there exist a subset $I \subseteq \{1, \ldots, s\}$, $|I| \ge (1 - \delta)s$, and a vector
$y \in S^{n-1} \cap \mathrm{span}(X_i)_{i \in I^c}$ such that

$\langle X_i, y\rangle \ge \frac{1}{4}(B\sqrt{n/s} + M)$, $i \in I$.

Proof. Without loss of generality, we may assume that $\delta > 0$ is smaller than a
suitably chosen absolute constant (this can be done by suitably increasing the
value of the constant $C$).

Step 1: random selection. Denote $a := B\sqrt{n/s} + M$. Then

$\langle X_i/a, x\rangle \ge 1$, $i = 1, \ldots, s$.

The convex hull $K := \mathrm{conv}\{X_i/a,\ i = 1, \ldots, s\}$ is separated in $\mathbb{R}^n$ from the
origin by the hyperplane $\{u : \langle u, x\rangle = 1\}$. By a separation argument, one can
find a vector $\bar{x} \in \mathrm{conv}(K \cup 0)$, $\|\bar{x}\|_2 = 1$, such that

(2.2) $\langle X_i/a, \bar{x}\rangle \ge 1$, $i = 1, \ldots, s$.

(Indeed, one chooses $\bar{x} = z/\|z\|_2$ where $z$ is the element of $K$ with the smallest
Euclidean norm.) We express $\bar{x}$ as a convex combination

$\bar{x} = \sum_{i=1}^s \lambda_i X_i/a$ for some $\lambda_i \ge 0$, $\sum_{i=1}^s \lambda_i \le 1$.

By Chebyshev's inequality, the set $E := \{i \le s : \lambda_i \le 1/\delta s\}$ has cardinality
at least $(1 - \delta)s$. We will perform a random selection on $E$. Let $\delta_1, \ldots, \delta_s$ be
i.i.d. selectors, i.e. independent $\{0,1\}$-valued random variables with $\mathbb{E}\delta_i = \delta$.
We define the random vector

$\bar{y} := \sum_{i \in E} \delta_i \lambda_i X_i/a + \sum_{i \in E^c} \delta \lambda_i X_i/a$; then $\mathbb{E}\bar{y} = \delta\bar{x}$.
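The random selection of Step 1 can be sketched as follows. This is an illustration under our own assumptions: the function names are hypothetical, vectors are plain lists, the weights `lams` are taken as given rather than computed from the separation argument, and the expectation is checked by brute-force enumeration of selector patterns, which is feasible only for very small $s$.

```python
import itertools


def selection_set(lams, delta):
    """Indices i with lambda_i <= 1/(delta*s); when the weights sum to at
    most 1, fewer than delta*s indices can violate this bound."""
    s = len(lams)
    return [i for i, lam in enumerate(lams) if lam <= 1.0 / (delta * s)]


def build_y(vectors, lams, delta, a, selectors):
    """y_bar = sum_{i in E} sel_i*lam_i*X_i/a + sum_{i not in E} delta*lam_i*X_i/a."""
    E = set(selection_set(lams, delta))
    dim = len(vectors[0])
    y = [0.0] * dim
    for i, (X, lam) in enumerate(zip(vectors, lams)):
        weight = selectors[i] if i in E else delta
        for j in range(dim):
            y[j] += weight * lam * X[j] / a
    return y


def expected_y(vectors, lams, delta, a):
    """Exact E[y_bar], enumerating all 0/1 selector patterns (small s only)."""
    s = len(vectors)
    dim = len(vectors[0])
    mean = [0.0] * dim
    for pattern in itertools.product([0, 1], repeat=s):
        prob = 1.0
        for bit in pattern:
            prob *= delta if bit == 1 else 1.0 - delta
        y = build_y(vectors, lams, delta, a, pattern)
        for j in range(dim):
            mean[j] += prob * y[j]
    return mean
```

The coordinates outside $E$ are kept deterministic with weight $\delta$ precisely so that the expectation of $\bar{y}$ equals $\delta\bar{x}$ while only the small weights $\lambda_i \le 1/\delta s$ are randomized.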

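Step 2: control of the norm and inner products. By independence and
by the definitions of $a$, $E$ and $B$ we have

$\mathbb{E}\|\bar{y} - \delta\bar{x}\|_2^2 = \mathbb{E}\Big\| \sum_{i \in E} (\delta_i - \delta)\lambda_i X_i/a \Big\|_2^2 = \sum_{i \in E} \mathbb{E}(\delta_i - \delta)^2 \cdot \lambda_i^2 \frac{\|X_i\|_2^2}{a^2} \le s\delta \cdot (1/\delta s)^2 \cdot \frac{K_1^2 n}{(B\sqrt{n/s})^2} \le \frac{K_1^2}{\delta B^2} \le 0.1\,\delta^2.$

By Chebyshev's inequality, we have with probability at least 0.9 that

(2.3) $\|\bar{y}\|_2 \le \|\bar{y} - \delta\bar{x}\|_2 + \|\delta\bar{x}\|_2 \le 2\delta.$

Now fix $k \in E$. By the definition of $\bar{y}$ and by (2.2), we have

(2.4) $\mathbb{E}\langle X_k/a, \bar{y}\rangle = \delta\langle X_k/a, \bar{x}\rangle \ge \delta.$

We will need a similar bound with high probability rather than in expectation.
More accurately, we would like to bound below

$p_k := \mathbb{P}\{\langle (1 - \delta_k) X_k/a, \bar{y}\rangle \ge \delta/2\}.$

Consider the random vector $\bar{y}^{(k)}$ obtained by removing from the sum defining
$\bar{y}$ the term corresponding to $X_k$:

$\bar{y}^{(k)} := \sum_{i \in E,\, i \ne k} \delta_i \lambda_i X_i/a + \sum_{i \in E^c} \delta \lambda_i X_i/a = \bar{y} - \delta_k \lambda_k X_k/a.$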
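Then $\bar{y}^{(k)}$ is independent of $\delta_k$, which gives

$p_k = \mathbb{P}\{\delta_k = 0\} \cdot \mathbb{P}\{\langle X_k/a, \bar{y}^{(k)}\rangle \ge \delta/2\}.$

By the definitions of $a$, $E$ and $B$ we can bound the contribution of the removed
term as

$\langle X_k/a, \lambda_k X_k/a\rangle = \lambda_k \frac{\|X_k\|_2^2}{a^2} \le (1/\delta s) \frac{K_1^2 n}{(B\sqrt{n/s})^2} = \frac{K_1^2}{\delta B^2} \le 0.1\,\delta^2.$

Then the random variable $Z_k := \langle X_k/a, \bar{y}^{(k)}\rangle$ satisfies by (2.4) that

$\mathbb{E} Z_k = \mathbb{E}\langle X_k/a, \bar{y}\rangle - \mathbb{E}\langle X_k/a, \delta_k \lambda_k X_k/a\rangle \ge \delta - 0.1\,\delta^3 \ge 0.9\,\delta.$

Similarly to the argument in the beginning of Step 2, we obtain

$\mathrm{Var}\, Z_k = \mathbb{E}(Z_k - \mathbb{E}Z_k)^2 = \mathbb{E}\Big\langle X_k/a, \sum_{i \in E,\, i \ne k} (\delta_i - \delta)\lambda_i X_i/a \Big\rangle^2 = \sum_{i \in E,\, i \ne k} \mathbb{E}(\delta_i - \delta)^2 \lambda_i^2 \frac{\langle X_k, X_i\rangle^2}{a^4} \le \delta \Big(\frac{1}{\delta s}\Big)^2 \frac{K_2^4 n s}{(B\sqrt{n/s} + M)^4} \le \frac{K_2^4}{\delta B^2 M^2} \le 0.1\,\delta^3.$

By Chebyshev's inequality, we conclude that $\mathbb{P}\{Z_k \ge \delta/2\} \ge 1 - \delta$. We have
shown that

$p_k \ge (1 - \delta)(1 - \delta) \ge 1 - 2\delta.$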

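Step 3: decoupling. Denoting by $\mathcal{E}_k$ the event $\langle (1 - \delta_k) X_k/a, \bar{y}\rangle \ge \delta/2$, we
have shown that $\mathbb{P}(\mathcal{E}_k) \ge 1 - 2\delta$ for all $k \in E$. Therefore with probability at
least 0.9, at least $(1 - 20\delta)|E|$ of the events $\mathcal{E}_k$ hold simultaneously. Indeed,
by linearity of expectation we have

$\mathbb{E} \sum_{k \in E} \mathbf{1}_{\mathcal{E}_k^c} = \sum_{k \in E} \mathbb{P}(\mathcal{E}_k^c) \le 2\delta|E|.$

By Chebyshev's inequality this yields

$\mathbb{P}\Big\{ \sum_{k \in E} \mathbf{1}_{\mathcal{E}_k} < (1 - 20\delta)|E| \Big\} = \mathbb{P}\Big\{ \sum_{k \in E} \mathbf{1}_{\mathcal{E}_k^c} > 20\delta|E| \Big\} \le \frac{2\delta|E|}{20\delta|E|} = \frac{1}{10}.$

We have shown that with probability at least 0.9 the following event occurs:
there exists a subset $I \subseteq E$, $|I| \ge (1 - 22\delta)s$, such that $\mathcal{E}_k$ holds
for all $k \in I$.

Assume the latter event occurs. By the definition of $\mathcal{E}_k$ we clearly have $\delta_k = 0$
whenever $\mathcal{E}_k$ holds. Hence by the definition of $\bar{y}$ one has $\bar{y} \in \mathrm{span}(X_i)_{i \in I^c}$. Also,
by the definition of $\mathcal{E}_k$, one has

$\langle X_k/a, \bar{y}\rangle \ge \delta/2$, $k \in I$.

Once we set $y := \bar{y}/\|\bar{y}\|_2$, this and (2.3) complete the proof. □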

3. Norms of random operators $\ell_2 \to \ell_{2,\infty}$

Recall that the weak $\ell_2$-norm $\|x\|_{2,\infty}$ of a vector $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ is
defined as the minimal number $M$ for which the non-increasing rearrangement
$(x_k^*)$ of the sequence $(|x_k|)$ satisfies $x_k^* \le M k^{-1/2}$, $k = 1, \ldots, N$. It is well
known that the quasi-norm $\|\cdot\|_{2,\infty}$ is equivalent to a norm on $\mathbb{R}^N$ (see [17]),
and one can easily check that $c_p \|x\|_p \le \|x\|_{2,\infty} \le \|x\|_2$ for all $p > 2$.

Although $\|\cdot\|_{2,\infty}$ is not a norm, for linear operators $A : \mathbb{R}^n \to \mathbb{R}^N$ we will be
interested in the "norm" $\|A\|_{\ell_2 \to \ell_{2,\infty}}$ defined as the minimal number $M$ such
that $\|Ax\|_{2,\infty} \le M\|x\|_2$ for all $x \in \mathbb{R}^n$.
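The definition of the weak $\ell_2$-norm translates directly into a short computation: the minimal admissible $M$ is $\max_k \sqrt{k}\, x_k^*$. The sketch below is illustrative; the function names are our own.

```python
import math


def weak_l2_norm(x):
    """Smallest M with x*_k <= M * k^(-1/2) for all k, i.e. max_k sqrt(k) * x*_k,
    where x* is the non-increasing rearrangement of (|x_k|)."""
    rearranged = sorted((abs(c) for c in x), reverse=True)
    return max(math.sqrt(k) * v for k, v in enumerate(rearranged, start=1))


def l2_norm(x):
    """Ordinary Euclidean norm, for comparison."""
    return math.sqrt(sum(c * c for c in x))
```

For the vector `[3.0, -4.0]` the rearrangement is $(4, 3)$, so the weak norm is $\max(4, 3\sqrt{2}) = 3\sqrt{2}$, which is below the Euclidean norm 5, in line with $\|x\|_{2,\infty} \le \|x\|_2$.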

Theorem 3.1. Consider independent random vectors $X_1, \ldots, X_N$ in $\mathbb{R}^n$ which
satisfy (1.4) for some $q > 4$. Then, for every $t \ge 1$, the random matrix $A$
whose rows are $X_i$ satisfies the following with probability at least $1 - Ct^{-0.9q}$.
For every index set $I \subseteq \{1, \ldots, N\}$, one has

$\|P_I A\|_{\ell_2 \to \ell_{2,\infty}} \le C_{K,L,q} \Big[ \sqrt{n} + t\sqrt{|I|}\,(N/|I|)^{2/q} \Big]$

where $P_I$ is the coordinate projection in $\mathbb{R}^N$ onto $\mathbb{R}^I$. In particular, one has

$\|A\|_{\ell_2 \to \ell_{2,\infty}} \le C_{K,L,q} (\sqrt{n} + t\sqrt{N}).$

Remarks. 1. This theorem is a finite-moment variant of Corollary 3.7 of [1],
where a similar result is proved under the stronger sub-exponential moment
assumptions (1.3). The latter is in turn a strengthening of an inequality of
Bourgain jl] that has some unnecessary logarithmic terms.

2. The conclusion of Theorem 3.1 can be equivalently stated as follows. For
every subset $I \subseteq \{1, \ldots, N\}$, one has

$\Big\| \sum_{i \in I} X_i \Big\|_2 \le C_{K,L,q} \Big[ \sqrt{n|I|} + t|I|\,(N/|I|)^{2/q} \Big].$

3. It seems possible that Theorem 3.1 holds for the spectral norm $\|A\|_{\ell_2 \to \ell_2}$.
This would imply that Theorem 1.1 holds in the important case $p = 2$.

The proof of Theorem 3.1 is based on the Decoupling Proposition 2.1. So
we will first need to verify the assumptions (2.1) on the vectors.

Lemma 3.2. Let $Z_1, \ldots, Z_N \ge 0$ be independent random variables which satisfy
$\mathbb{E} Z_i^q \le B^q$ for some $q > 0$ and some $B$. Consider the non-increasing
rearrangement $(Z_i^*)$ of $(Z_i)$. Then, for every $t \ge 1$, one has with probability at
least $1 - Ct^{-q}/N$ that

(3.1) $Z_i^* \le tB (N/i)^{2/q}$, $i = 1, \ldots, N$.

In particular, for $q > 4$, (3.1) implies

$\frac{1}{s} \sum_{i=1}^s (Z_i^*)^2 \le C_q t^2 B^2 (N/s)^{4/q}$, $s = 1, \ldots, N$.

Proof. By homogeneity, we can assume that $B = 1$. Then by Chebyshev's
inequality we have $\mathbb{P}\{Z_j > u\} \le u^{-q}$ for every $j \le N$ and $u > 0$. Now, if
$Z_i^* > u$ then there exists a set $J \subseteq \{1, \ldots, N\}$, $|J| = i$, such that $Z_j > u$ for
all $j \in J$. Taking the union bound over possible choices of the subsets $J$, using
independence and Stirling's approximation, we obtain for all $i = 1, \ldots, N$:

$\mathbb{P}\{Z_i^* > u\} \le \binom{N}{i} \big( \max_j \mathbb{P}\{Z_j > u\} \big)^i \le \Big( \frac{eN}{i} \Big)^i u^{-qi} = (e u^{-q} N/i)^i.$

Choosing $u = t(eN/i)^{2/q}$ we obtain $\mathbb{P}\{Z_i^* > u\} \le (t^{-q} i/eN)^i$. Then, for $t \ge 1$,

$\mathbb{P}\{\exists i \le N : Z_i^* > u\} \le \sum_{i=1}^N (t^{-q} i/eN)^i \le Ct^{-q}/N.$

This easily implies the first part of the lemma. The second part follows by
summation using that $\frac{1}{s} \sum_{i=1}^s i^{-r} \le C_r s^{-r}$ for $0 < r < 1$; here $r = 4/q$. □
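The summation step can be spelled out as follows. This is a sketch: the explicit constants $1/(1-r)$ and $q/(q-4)$ are our own bookkeeping and are not claimed to coincide with the author's choice of $C_r$ and $C_q$.

```latex
% For 0 < r < 1 the function u -> u^{-r} is decreasing,
% so i^{-r} <= \int_{i-1}^{i} u^{-r} du for every i >= 1.
\frac{1}{s}\sum_{i=1}^{s} i^{-r}
  \le \frac{1}{s}\int_0^s u^{-r}\,du
  = \frac{s^{-r}}{1-r}.

% Squaring (3.1) and averaging, with r = 4/q < 1:
\frac{1}{s}\sum_{i=1}^{s} (Z_i^*)^2
  \le t^2 B^2 N^{4/q}\cdot\frac{1}{s}\sum_{i=1}^{s} i^{-4/q}
  \le \frac{q}{q-4}\, t^2 B^2 (N/s)^{4/q}.
```

The restriction $q > 4$ is exactly what makes $r = 4/q < 1$, so that the integral converges at the origin.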

Lemma 3.3. Consider independent random vectors $X_1, \ldots, X_N$ in $\mathbb{R}^n$ which
satisfy (1.4) for some $q > 4$. Then for every $t \ge 1$ the following holds with
probability at least $1 - Ct^{-q}$. For every subset $E \subseteq \{1, \ldots, N\}$ and every $k \le N$
one has

$\frac{1}{|E|} \sum_{i \in E,\, i \ne k} \langle X_i, X_k\rangle^2 \le C_q t^2 K^2 L^2 (N/|E|)^{4/q}\, n.$

Proof. We fix $k \le N$ and, conditioning on $X_k$, apply Lemma 3.2 to the random variables
$Z_i^{(k)} := |\langle X_i, X_k\rangle|$, $i \le N$, $i \ne k$. By assumptions (1.4), we have $\mathbb{E}(Z_i^{(k)})^q \le (KL\sqrt{n})^q$.
Then with probability at least $1 - Ct^{-q}/N$, we have

$\frac{1}{s} \sum_{i=1}^s \big( (Z^{(k)})_i^* \big)^2 \le C_q t^2 K^2 L^2 (N/s)^{4/q}\, n$, $s = 1, \ldots, N$.

Taking the union bound over $k \le N$ completes the proof. □

Proof of Theorem 3.1. By homogeneity, we can assume that $L = 1$. Also, by
decomposing $I$ in three sets of roughly equal cardinality we see that it suffices
to prove the conclusion for the subsets $I$ of cardinality $|I| \le N/2$.

Denote by $\mathcal{E}$ the event in the conclusion of Lemma 3.3. If $\mathcal{E}$ holds, then
the assumptions (2.1) of Decoupling Proposition 2.1 are satisfied for every $s$
and every subset $(X_i)_{i \in E}$, $E \subseteq \{1, \ldots, N\}$, $|E| = s$, and with parameters
$K_1 = K$, $K_2^4 = C_q t^2 K^2 (N/s)^{4/q}$. So, in view of the application of Decoupling
Proposition 2.1, we consider $B = B(K, \delta)$ and $M_1 = M_1(q, \delta, t)$ defined as

$B := C\delta^{-3/2} K_1$, $M = C\delta^{-1} K_2^2/K_1 = C'_q \delta^{-1} t\, (N/s)^{2/q} =: M_1 (N/s)^{2/q}$.

Note that we can assume that $C'_q \ge 8$, which we will use later.

We will now need a convenient interpretation of the conclusion of the theorem.
Given $x \in S^{n-1}$, we denote by $|\langle X_{\pi(i)}, x\rangle|$ a non-increasing rearrangement of
the sequence $|\langle X_i, x\rangle|$, $i = 1, \ldots, N$. Denote by $D$ the minimal number such
that for every $x \in S^{n-1}$ and every $s \le N/2$ one has

$|\langle X_{\pi(s)}, x\rangle| \le R_s := D\big[ B\sqrt{n/s} + M_1 (N/s)^{2/q} \big]$.

Since $q > 4$, the quantity $\sqrt{s}\,(N/s)^{2/q}$ is non-decreasing in $s$. Therefore one
has for every $s \le m \le N/2$:

$|\langle X_{\pi(s)}, x\rangle| \le D\big[ B\sqrt{n/s} + M_1 \sqrt{m/s}\,(N/m)^{2/q} \big]$.

It follows that for every $x \in S^{n-1}$, every $m \le N/2$, and every index set
$I \subseteq \{1, \ldots, N\}$, $|I| = m$, one has

$\|(\langle X_i, x\rangle)_{i \in I}\|_{2,\infty} \le D\big[ B\sqrt{n} + M_1 \sqrt{m}\,(N/m)^{2/q} \big]$.

If we are able to show that $D \le 1$ with the high probability as required in
Theorem 3.1, this would clearly complete the proof.

Since the event $\mathcal{E}$ holds with probability at least $1 - Ct^{-q}$, it suffices to show
that the event $\{\mathcal{E}$ and $D > 1\}$ occurs with probability at most $Ct^{-0.99q}$. Let
us assume that the latter event does occur. By the definition of $D$, one can find
an integer $s \le N/2$, a subset $E \subseteq \{1, \ldots, N\}$, $|E| = s$, and a vector $x \in S^{n-1}$
such that

$|\langle X_i, x\rangle| \ge R_s$, $i \in E$.

By the definition of $R_s$, $B$, $M$ above, Decoupling Proposition 2.1 can be applied
for $(X_i)_{i \in E}$, and it yields the following. There exists a decomposition $E = I \cup J$
into disjoint sets $I$ and $J$ such that $|I| \ge (1 - \delta)s$, $|J| \le \delta s$, and there exists a
vector $y \in \mathrm{span}(X_j)_{j \in J}$, $\|y\|_2 = 1$, such that

(3.2) $|\langle X_i, y\rangle| \ge R_s/4$, $i \in I$.

Let $\beta = \beta(\delta) > 0$ be a sufficiently small quantity to be determined later.
Consider a $\beta$-net $\mathcal{N}_J$ of the sphere $S^{n-1} \cap \mathrm{span}(X_j)_{j \in J}$. As is known by a
volumetric argument (see e.g. [H] Lemma 2.6), one can choose such a net with
cardinality

$|\mathcal{N}_J| \le (3/\beta)^{|J|}$.

We can assume that the random set $\mathcal{N}_J$ depends only on $\beta$ and the random
variables $(X_j)_{j \in J}$. There exists $y_0 \in \mathcal{N}_J$ such that $\|y - y_0\|_2 \le \beta$. By the definition
of $D$, this implies that

$|\langle X_{\pi(\lceil \delta s \rceil)}, y - y_0\rangle| \le R_{\lceil \delta s \rceil} \cdot \beta \le R_{\delta s} \cdot \beta \le (R_s/\sqrt{\delta})\,\beta = R_s/8$

if we choose $\beta = \sqrt{\delta}/8$. This means that all but at most $\delta s$ indices $i$ in $I$
satisfy the inequality $|\langle X_i, y - y_0\rangle| \le R_s/8$, and therefore (by (3.2)) also the
inequality $|\langle X_i, y_0\rangle| \ge R_s/8$. Let us denote the set of these indices by $I_0$.
Note that

$R_s/8 \ge \frac{1}{8} M_1 (N/s)^{2/q}$ (by the definition of $R_s$ and since $D > 1$)
$\ge \frac{C'_q}{8} (t/\delta)(N/s)^{2/q}$ (by the definition of $M_1$)
$\ge (t/\delta)(N/s)^{2/q}$ (since $C'_q \ge 8$).

Summarizing, we have shown that the event $\{\mathcal{E}$ and $D > 1\}$ implies the
following event that we call $\mathcal{E}_0$: there exist an integer $s \le N/2$, disjoint index
subsets $I_0 = I_0(s)$, $J = J(s) \subseteq \{1, \ldots, N\}$ with cardinalities $|I_0| \ge (1 - 2\delta)s$,
$|J| \le \delta s$, and a vector $y_0 \in \mathcal{N}_J$ such that

$|\langle X_i, y_0\rangle| \ge (t/\delta)(N/s)^{2/q}$, $i \in I_0$.

Note that by Chebyshev's inequality and independence, for a fixed $y_0 \in S^{n-1}$
and a fixed set $I_0 \subseteq \{1, \ldots, N\}$ as above, one has

(3.3) $\mathbb{P}\{ |\langle X_i, y_0\rangle| \ge (t/\delta)(N/s)^{2/q},\ i \in I_0 \} \le \big( (t/\delta)(N/s)^{2/q} \big)^{-q|I_0|} = \big( (\delta/t)^q (s/N)^2 \big)^{|I_0|}.$

Then we can bound the probability of $\mathcal{E}_0$ by taking the union bound over all
$s, I_0, J$ as above, conditioning on the random variables $(X_j)_{j \in J}$ (which fixes
the net $\mathcal{N}_J$), taking the union bound over $y_0 \in \mathcal{N}_J$, and finally evaluating the
probability using (3.3). This yields

$\mathbb{P}(\mathcal{E}_0) \le \sum_{s=1}^{N/2} \binom{N}{|I_0|} \binom{N}{|J|} |\mathcal{N}_J| \big( (\delta/t)^q (s/N)^2 \big)^{|I_0|}$

(recall that $I_0$ and $J$ in this sum may depend on $s$). Also recall that with our
choice $\beta = \sqrt{\delta}/8$, we have $|\mathcal{N}_J| \le (24/\sqrt{\delta})^{|J|}$. Further, by our choice of $M_1$ we
have $R_s/8 \ge \delta^{-1} t\, (N/s)^{2/q}$. Using Stirling's approximation, we obtain



p w^b(i) (- 1 1 ' 



AT/2 

/eiV (0\v ( 

Estimating $s$ in the summand by $2|I_0|$ and using the inequalities $|I_0| \ge (1 - 2\delta)s$
and $|J| \le \delta s$ along with monotonicity, we conclude for a sufficiently small $\delta$
that

/ /A\?<i\M) s /(7Af\ fc ^ /t~ q H \ (l-3<5)s 
s=l s=l 

This completes the proof of Theorem 3.1. □

4. Approximation of marginals and the $\ell_2 \to \ell_p$ norms of random operators

In this section we deduce from Theorem 3.1 the main results of this paper,
Theorems 1.1 and 1.2. The method of this deduction is by now
standard; it was used in particular in [1]. It consists of an application of
symmetrization, truncation, and the contraction principle, and it reduces the problem
to estimating the contribution to the sum of large coefficients.

Specifically, given a threshold $B > 0$ and a vector $x \in S^{n-1}$, we define the set
of large coefficients with respect to random vectors $X_1, \ldots, X_N$ as

$E_B = E_B(x) = \{ i \le N : |\langle X_i, x\rangle| \ge B \}.$

The truncation argument in the beginning of the proof of Proposition 4.4 in [1]
yields the following bound:

Lemma 4.1 (Reduction to the few large coefficients). Let $p > 2$, $B > 0$,
$t \ge 1$. Consider independent random vectors $X_i$ in $\mathbb{R}^n$ which satisfy (1.7) for
$q = 2p$. Then for every positive integer $N$, with probability at least

$1 - \exp\big( -c \min( t^2 n B^{2p-2},\ t\sqrt{Nn}/B ) \big)$

one has

(4.1) $\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^N \big( |\langle X_i, x\rangle|^p - \mathbb{E}|\langle X_i, x\rangle|^p \big) \Big| \le t B^{p-1} \sqrt{\frac{n}{N}} + \sup_{x \in S^{n-1}} \frac{1}{N} \sum_{i \in E_B(x)} |\langle X_i, x\rangle|^p + \sup_{x \in S^{n-1}} \mathbb{E} \frac{1}{N} \sum_{i \in E_B(x)} |\langle X_i, x\rangle|^p$

where $c = c_{p,K} > 0$ depends only on $p$ and the parameter $K$ in the moment
assumption (1.7). □

This lemma reduces the approximation problem in Theorem 1.1 to finding an
upper bound on the contribution of the large coefficients $\sum_{i \in E_B(x)} |\langle X_i, x\rangle|^p$.
In the following lemma, we observe that a slightly stronger bound (for the
$\|\cdot\|_{2,\infty}$ norm rather than the $\|\cdot\|_p$ norm) follows from Theorem 3.1. To facilitate
the notation, until the end of this section we will write $a \lesssim b$ if
$a \le C_{K,L,p,q,\delta}\, b$.
Lemma 4.2 (Large coefficients). Let $q > 4$, $t \ge 1$, $\varepsilon \in (0,1)$ and
$B \ge t(\varepsilon N/n)^{2/(q-4)}$. Consider independent random vectors $X_1, \ldots, X_N$ in $\mathbb{R}^n$ which
satisfy (1.4). Then with probability at least $1 - Ct^{-0.9q}$, one has for every
$x \in S^{n-1}$:

$|E_B| \lesssim t^2 n/\varepsilon B^2$, $\|(\langle X_i, x\rangle)_{i \in E_B}\|_{2,\infty} \lesssim t\sqrt{n/\varepsilon}$.

Proof. By the definition of the set $E_B$ and the norm $\|\cdot\|_{2,\infty}$, and using Theorem 3.1,
we obtain with the required probability:

(4.2) $B^2 |E_B| \le \|(\langle X_i, x\rangle)_{i \in E_B}\|_{2,\infty}^2 \lesssim n + t^2 |E_B| (N/|E_B|)^{4/q}.$

It follows that $|E_B| \lesssim n/B^2 + N(t/B)^{q/2}$. This and the assumption on $B$
imply that $|E_B| \lesssim t^2 n/\varepsilon B^2$ as required. Substituting this estimate into the
second inequality in (4.2), we complete the proof. □
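The two algebraic steps in this proof can be written out as follows. This is our own expansion of the argument, with implied constants suppressed by $\lesssim$ as in the text.

```latex
% Step 1. If the second term on the right of (4.2) dominates, then
B^2 |E_B| \lesssim t^2 |E_B|^{1-4/q} N^{4/q}
  \;\Longrightarrow\;
  |E_B|^{4/q} \lesssim \frac{t^2 N^{4/q}}{B^2}
  \;\Longrightarrow\;
  |E_B| \lesssim N (t/B)^{q/2};
% otherwise |E_B| \lesssim n / B^2. Hence |E_B| \lesssim n/B^2 + N (t/B)^{q/2}.

% Step 2. The assumption B >= t (\varepsilon N / n)^{2/(q-4)} is equivalent to
% (t/B)^{(q-4)/2} <= n / (\varepsilon N), so
N (t/B)^{q/2}
  = N\,(t/B)^{(q-4)/2}\,(t/B)^{2}
  \le N \cdot \frac{n}{\varepsilon N} \cdot \frac{t^2}{B^2}
  = \frac{t^2 n}{\varepsilon B^2},
\qquad
\frac{n}{B^2} \le \frac{t^2 n}{\varepsilon B^2}
  \quad (t \ge 1,\ \varepsilon < 1).
```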

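Proposition 4.3 (Deviation). Let $p > 2$, $\varepsilon \in (0,1)$, $\delta > 0$ and $N \ge n/\varepsilon + C$
where $C = C_{p,K,\delta}$ is suitably large. Consider independent random vectors $X_i$
in $\mathbb{R}^n$ which satisfy (1.4) for $q = 4p$. Then with probability at least $1 - \delta$ one
has

(4.3) $\sup_{x \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^N \big( |\langle X_i, x\rangle|^p - \mathbb{E}|\langle X_i, x\rangle|^p \big) \Big| \lesssim \varepsilon^{1/2} + \frac{1}{N}\Big(\frac{n}{\varepsilon}\Big)^{p/2} + \Big(\frac{n}{\varepsilon N}\Big)^{3/2}.$

Remarks. 1. Theorem 1.1 follows immediately from this result.

2. One could of course optimize the right hand side in $\varepsilon$; we did not do this
in order to make clear where the three terms come from.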

Proof. We choose $B := t(\varepsilon N/n)^{2/(q-4)}$ so that Lemma 4.2 holds.

Next, we choose $t = t(\delta, K)$ and $C = C_{p,K,\delta}$ sufficiently large so that the
probabilities in Lemmas 4.1 and 4.2 are at least $1 - \delta/2$ each. This is indeed
possible for the probability in Lemma 4.1, as one can check that
$t^2 n B^{2p-2} = t^{2p} \varepsilon N \ge t^{2p}$ and $t\sqrt{Nn}/B \ge N^{1/2 - 2/(q-4)} \ge C^{(p-2)/2(p-1)}$; for the probability
in Lemma 4.2 this is straightforward.
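The identity $t^2 n B^{2p-2} = t^{2p}\varepsilon N$ relies on $q = 4p$, which gives $2/(q-4) = 1/(2(p-1))$. A small numerical check is sketched below; the function names and the sample parameter values are our own and serve only as an illustration.

```python
import math


def truncation_level(t, eps, N, n, q):
    """B = t * (eps * N / n)^(2 / (q - 4)), the level chosen in the proof."""
    return t * (eps * N / n) ** (2.0 / (q - 4.0))


def lemma_41_exponents(t, eps, N, n, p):
    """The two quantities inside the min in Lemma 4.1 when q = 4p."""
    B = truncation_level(t, eps, N, n, 4.0 * p)
    return t ** 2 * n * B ** (2 * p - 2), t * math.sqrt(N * n) / B
```

With `t = 2`, `eps = 0.5`, `N = 1000`, `n = 50`, `p = 3`, the first quantity equals $t^{2p}\varepsilon N = 64 \cdot 500 = 32000$ up to rounding, and the second is at least $N^{1/2 - 2/(q-4)} = 1000^{1/4}$.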

Let us assume that the conclusions of both these lemmas hold; as we now
know, this happens with probability at least $1 - \delta$. Our goal is to estimate the
three terms in the right hand side of (4.1).

By our choice of $B$, the first term in the right hand side of (4.1) is $\lesssim \varepsilon^{1/2}$ as
required. The second term can be bounded using Lemma 4.2. Since $\|\cdot\|_p \lesssim \|\cdot\|_{2,\infty}$
for $p > 2$, we obtain that

$\sup_{x \in S^{n-1}} \frac{1}{N} \sum_{i \in E_B} |\langle X_i, x\rangle|^p \lesssim \sup_{x \in S^{n-1}} \frac{1}{N} \|(\langle X_i, x\rangle)_{i \in E_B}\|_{2,\infty}^p \lesssim \frac{(n/\varepsilon)^{p/2}}{N}$

as required. To compute the third term in the right hand side of (4.1), consider
for a fixed $x$ the random variable $Z_i = |\langle X_i, x\rangle|$. Since $\mathbb{E} Z_i^q \le L^q$, we have

$\mathbb{E} Z_i^p \mathbf{1}_{\{Z_i \ge B\}} \le \mathbb{E} Z_i^p (Z_i/B)^{q-p} \mathbf{1}_{\{Z_i \ge B\}} \le \mathbb{E} Z_i^q / B^{q-p} \le L^q B^{p-q}.$
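Therefore, by our choice of $B$, we have

$\sup_{x \in S^{n-1}} \mathbb{E} \frac{1}{N} \sum_{i \in E_B} |\langle X_i, x\rangle|^p = \sup_{x \in S^{n-1}} \frac{1}{N} \sum_{i=1}^N \mathbb{E} Z_i^p \mathbf{1}_{\{Z_i \ge B\}} \le L^q B^{p-q} \lesssim \Big( \frac{n}{\varepsilon N} \Big)^{2(q-p)/(q-4)} \le \Big( \frac{n}{\varepsilon N} \Big)^{3/2}.$

Combining these estimates, we complete the proof. □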

Remark. Theorem 1.2 now follows easily. We can assume that $N \ge C$ where
$C = C_{p,K,\delta}$ is suitably large. Now, for $N \le n$ this result follows from
Theorem 3.1 since $\|A\|_{\ell_2 \to \ell_p} \lesssim \|A\|_{\ell_2 \to \ell_{2,\infty}}$. For $N \ge n$, the result follows from
Proposition 4.3 with $\varepsilon = 1$, noting that $(\mathbb{E}|\langle X_i, x\rangle|^p)^{1/p} \le L$ as $p \le q$.